
Optimze Gelu with MKL Erf function #15770

Merged: tensor-tang merged 10 commits into PaddlePaddle:develop from yihuaxu:develop_a6910f900_gelu_mkl_opt on Feb 22, 2019

yihuaxu (Contributor) commented Feb 18, 2019

Based on the performance profile of the BERT model, this PR optimizes the GELU operator to speed up data processing.

Platform: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Model Path: third_party/inference_demo/bert_emb128/model
Batch Size: 1
Command: ./paddle/fluid/inference/tests/api/test_analyzer_bert --infer_model=third_party/inference_demo/bert_emb128/model/ --infer_data=third_party/inference_demo/bert_emb128/data.txt --gtest_filter=Analyzer_bert.profile --paddle_num_threads=1 --repeat=1 --batch_size=1 --test_all_data --profile
Data Source: third_party/inference_demo/bert_emb128/data.txt.

The following image compares the results across the different scenarios.

image

yihuaxu force-pushed the develop_a6910f900_gelu_mkl_opt branch from 76eaa67 to 5f55ede on February 18, 2019 07:56
luotao1 added the Intel label on Feb 18, 2019
luotao1 requested a review from tensor-tang on February 18, 2019 08:35
yihuaxu (Contributor Author) commented Feb 19, 2019

start a review

  SET(MKLML_SHARED_IOMP_LIB ${MKLML_LIB_DIR}/libiomp5md.dll)
  ELSE()
- SET(MKLML_VER "mklml_lnx_${TIME_VERSION}" CACHE STRING "" FORCE)
+ SET(MKLML_VER "VsErf_mklml_lnx_${TIME_VERSION}" CACHE STRING "" FORCE)
Contributor:
Please add a comment noting that this is a temporary mklml lib that includes erf, e.g. TODO(intel-huying)?

Contributor Author:
Done

template <typename T>
void VINV(int n, const T* a, T* y) const;

#ifdef PADDLE_WITH_MKLML
Contributor:
Do not add #ifdef here.

You can make this function general; see VMUL for reference.

Contributor Author:
Done
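
One way to read the request above is sketched below: the function is available unconditionally, and any MKL-specific path is chosen inside the implementation rather than around the declaration. This is only an illustration, not the code from this PR. The namespace `sketch`, the build flag `SKETCH_WITH_MKL`, and the three-argument signature (the accuracy-mode argument from the PR is omitted) are assumptions made for the example; `vsErf` and `vdErf` are the MKL vector-math erf routines.

```cpp
#include <cmath>

#ifdef SKETCH_WITH_MKL  // hypothetical build flag, not Paddle's own macro
#include <mkl_vml.h>
#endif

namespace sketch {

// Generic element-wise erf, available on every build.
// In-place use (a == y) is fine because each element is read before written.
template <typename T>
void VMERF(int n, const T* a, T* y) {
  for (int i = 0; i < n; ++i) {
    y[i] = std::erf(a[i]);
  }
}

#ifdef SKETCH_WITH_MKL
// When MKL is available, float and double dispatch to its vectorized erf.
template <>
inline void VMERF<float>(int n, const float* a, float* y) {
  vsErf(n, a, y);
}

template <>
inline void VMERF<double>(int n, const double* a, double* y) {
  vdErf(n, a, y);
}
#endif

}  // namespace sketch
```

Callers can then write `sketch::VMERF(n, in, out)` without any preprocessor guard of their own.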

std::memset(out_data, 0, n * sizeof(T));
math::CBlas<T>::AXPY(n, static_cast<T>(M_SQRT1_2), x_data, 1, out_data, 1);
math::CBlas<T>::VMERF(n, out_data, out_data, VML_LA);
for (int i = 0; i < n; i++) out_data[i] += static_cast<T>(1);
Contributor:
Code style: please put the loop body in braces:

for () {
  ...
}

The same applies to the loops below.

Contributor Author:
Done
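
For readers following the excerpt in this thread, it corresponds to the erf form of GELU, GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2))): scale the input by 1/sqrt(2), apply erf, then add 1. The portable sketch below walks through the same passes using `std::erf` instead of MKL. The function name `GeluErf` is hypothetical, and the final multiply by 0.5 * x is not shown in the excerpt; it is filled in here from the GELU definition.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Computes GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2))) in separate passes,
// mirroring the staged structure of the excerpt (scale, erf, add one).
template <typename T>
std::vector<T> GeluErf(const std::vector<T>& x) {
  const T inv_sqrt2 = static_cast<T>(0.70710678118654752440);  // 1 / sqrt(2)
  std::vector<T> out(x.size());

  // Pass 1: out = x / sqrt(2) (the memset + AXPY step in the excerpt).
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = x[i] * inv_sqrt2;
  }
  // Pass 2: out = erf(out) (the VMERF step; std::erf stands in for MKL).
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::erf(out[i]);
  }
  // Pass 3: out += 1.
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] += static_cast<T>(1);
  }
  // Pass 4: out = 0.5 * x * out (assumed from the GELU definition).
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] *= static_cast<T>(0.5) * x[i];
  }
  return out;
}
```

Keeping each pass as a whole-array operation is what allows the erf pass to be swapped for a vectorized library call.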

tensor-tang (Contributor) left a comment:
LGTM

yihuaxu (Contributor Author) commented Feb 21, 2019

@panyx0718 Please help review this PR, since it changes key files.

[10:23:00] + echo 'current pr 15770 got approvals: FALSE'
[10:23:00] + '[' FALSE == FALSE ']'
[10:23:00] + echo 'You must have panyx0718 approval for the api change! cmake/external'
[10:23:00] + exit 1
[10:23:07] Process exited with code 1

luotao1 requested a review from panyx0718 on February 21, 2019 02:00
tensor-tang (Contributor) commented Feb 21, 2019

The new profile change #15301 seems to have a dependency problem: http://ci.paddlepaddle.org/viewLog.html?tab=buildLog&logTab=tree&filter=debug&expand=all&buildId=61983&_focus=9247

In file included from /paddle/paddle/fluid/platform/profiler.h:20:0,
[15:25:05]                 from /paddle/paddle/fluid/platform/device_tracer.cc:33:
[15:25:05]/paddle/paddle/fluid/platform/device_context.h:30:22: fatal error: mkldnn.hpp: No such file or directory
[15:25:05]compilation terminated.
[15:25:05]make[2]: *** [paddle/fluid/platform/CMakeFiles/device_tracer.dir/device_tracer.cc.o] Error 1
[15:25:05]make[1]: *** [paddle/fluid/platform/CMakeFiles/device_tracer.dir/all] Error 2
[15:25:05]make[1]: *** Waiting for unfinished jobs....

#15860 seems to have fixed it. @yihuaxu, you need to merge the latest changes.

tensor-tang merged commit 676995c into PaddlePaddle:develop on Feb 22, 2019
tensor-tang added a commit that referenced this pull request Feb 22, 2019
tensor-tang added a commit that referenced this pull request Feb 22, 2019
* Revert "Optimze Gelu with MKL Erf function (#15770)"

This reverts commit 676995c.

* test=develop